On the Partial Correlation Ratio.
By Karl Pearson, F.R.S.

(Received May 31, 1915.)

(1) In a paper communicated to the Royal Society in 1903* I gave very
briefly in a footnote the properties of the correlation ratio. These properties
were discussed more at length in my memoir, "On the General Theory of
Skew Correlation and Non-linear Regression,"† published in 1905. The
two papers dealt only with the total correlation ratio, or the relation between
two variates without consideration of any other correlated variates. The
introduction of the correlation ratio enabled the measure of the relationship
between two variates to be expressed by a single number, measuring its
total intensity, in cases where the regression line was of any form. The
ratio passed into the usual correlation coefficient when the regression line 
became straight. This correlation ratio has been generally accepted by 
statisticians as a useful measure of relationship in cases of skew correlation 
and non-linear regression. Shortly after the appearance of the above 
memoirs I generalised this coefficient in a manner comparable with the 
generalisation of the coefficient of correlation, namely, by the definitions of 
the multiple correlation ratio and of the partial correlation ratio. These 
ratios correspond to the multiple correlation coefficient and the partial 
correlation coefficient in multiple linear regression. Their importance is 
very considerable, as they enable us to measure the intensity of association 
between two variates when other correlated variates are considered as 
constant without any assumption that the regression is linear, still less that 
the frequencies follow the normal (or Laplace-Gaussian) surface. I had not 
intended to discuss the results of the present paper before the probable 
errors had been provided, but the recent revival of interest in skew regression, 
and its fundamental importance in all higher statistical inquiry, justifies, at 
least, the publication of those formulae which are fundamental to the subject. 

(2) I deal first with the problem of three variates, although the extension 
to any number is not hard to make. 

Let these three variates be x, y, and z; further, let the symbol ${}_z\eta_{y.x}$ signify
the partial correlation ratio of y on x for a constant z. Similarly, ${}_z\eta_{x.y}$
signifies the partial correlation ratio of x on y for a constant z.

* 'Roy. Soc. Proc.,' vol. 71, pp. 303-4.

† "Mathematical Contributions to the Theory of Evolution," XIV, 'Drapers'
Company Research Memoirs' (Cambridge University Press).
If we take for a given value of z, say $z_p$, the corresponding x, y population,
we can determine in the usual manner the special correlation ratio of y on x
for this selected population. It may be represented by ${}_{z_p}\eta_{y.x}$; then by
definition of a correlation ratio:

$$1 - {}_{z_p}\eta^2_{y.x} = \frac{S_x S_y\{{}_{z_p}n_{xy}\,(y - {}_{z_p}\bar{y}_x)^2\}}{{}_{z_p}n \; {}_{z_p}\sigma_y^2}.$$

Here the subscript $z_p$ affixed means limitation to the population of x and y
selected by giving z the constant value $z_p$. Thus ${}_{z_p}n$ is the whole frequency
of such a population, ${}_{z_p}\bar{y}_x$ the mean value for such a population of the array
of y's corresponding to a given x, ${}_{z_p}\sigma_y$ the standard deviation of all y's in such
a population. The summations $S_x$, $S_y$ are also to be for this limited field,
while ${}_{z_p}n_{xy}$ is any cell frequency x, y in the same field.

I now define my partial correlation coefficient ${}_z\eta_{y.x}$ by stating that
$(1 - {}_z\eta^2_{y.x}) \times$ mean value of ${}_{z_p}\sigma_y^2$ shall be taken as the weighted mean of
such expressions as $(1 - {}_{z_p}\eta^2_{y.x}) \times {}_{z_p}\sigma_y^2$ for all values of z. But the mean
value of ${}_{z_p}\sigma_y^2 = \sigma_y^2(1 - \eta^2_{y.z})$ by definition. Hence,
if N be the total population,

$$(1 - {}_z\eta^2_{y.x})(1 - \eta^2_{y.z})\,\sigma_y^2 = S_z S_x S_y\{{}_{z_p}n_{xy}\,(y - {}_{z_p}\bar{y}_x)^2\}/N.$$

Now ${}_{z_p}\bar{y}_x$ is clearly the mean value of y for the group of y's corresponding
to a constant z and x, and may be written $\bar{y}_{xz}$; we may accordingly read our
triple summation as

$$S_z S_x S_y\{n_{xyz}(y - \bar{y} + \bar{y} - \bar{y}_{xz})^2\}/N = S_z S_x S_y\{n_{xyz}(y - \bar{y})^2\}/N$$

$$\qquad + S_z S_x S_y\{n_{xyz}(\bar{y} - \bar{y}_{xz})^2\}/N + 2 S_z S_x S_y\{n_{xyz}(y - \bar{y})(\bar{y} - \bar{y}_{xz})\}/N.$$

The first of these sums is clearly $\sigma_y^2$. In the third of these sums the factor
changing with y is $n_{xyz}(y - \bar{y})$, and summed for y this equals

$$S_y\{n_{xyz}(y - \bar{y})\} = n_{xz}(\bar{y}_{xz} - \bar{y}).$$

Hence the third sum equals

$$-2 S_z S_x\{n_{xz}(\bar{y} - \bar{y}_{xz})^2\}/N,$$

which is precisely double the second sum after summing for y, with the sign
reversed. Thus our triple summation

$$= \sigma_y^2 - S_z S_x\{n_{xz}(\bar{y} - \bar{y}_{xz})^2\}/N.$$

I write the last summation

$$H^2_{y.xz}\,\sigma_y^2.$$

$H_{y.xz}$ is comparable with $\eta_{y.x}$; it is the correlation ratio, not of y on
arrays of x, but of y on arrays of x with z. It must be carefully distinguished
from ${}_z\eta_{y.x}$, which is a true partial correlation ratio. We have,

$$(1 - {}_z\eta^2_{y.x})(1 - \eta^2_{y.z})\,\sigma_y^2 = \sigma_y^2 - H^2_{y.xz}\,\sigma_y^2,$$

or

$${}_z\eta^2_{y.x} = \frac{H^2_{y.xz} - \eta^2_{y.z}}{1 - \eta^2_{y.z}}.$$

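To test this result suppose the regressions to become linear, then $\eta_{y.z}$
becomes the correlation coefficient $r_{yz}$, and $H_{y.xz}$ becomes $R_{y.xz}$, the multiple
correlation coefficient between y and x with z, which is well known to be
given by

$$R^2_{y.xz} = \frac{r_{yx}^2 + r_{yz}^2 - 2 r_{yx} r_{yz} r_{xz}}{1 - r_{xz}^2}.$$

Hence

$${}_z\eta^2_{y.x} = \frac{\dfrac{r_{yx}^2 + r_{yz}^2 - 2 r_{yx} r_{yz} r_{xz}}{1 - r_{xz}^2} - r_{yz}^2}{1 - r_{yz}^2} = \frac{(r_{yx} - r_{xz} r_{yz})^2}{(1 - r_{xz}^2)(1 - r_{yz}^2)}.$$

Or

$${}_z\eta_{y.x} = \frac{r_{yx} - r_{xz} r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}} = {}_z r_{yx},$$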

that is to say the partial correlation ratio becomes in this case, as it should do, 
the partial correlation coefficient. 
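
The linear test above can be checked numerically. The following is a minimal sketch in Python, not part of the memoir; the three total correlations are made-up values chosen only for illustration, and the function names are hypothetical.

```python
import math

def multiple_r2(r_yx, r_yz, r_xz):
    """Squared multiple correlation coefficient R^2 of y on x with z."""
    return (r_yx**2 + r_yz**2 - 2 * r_yx * r_yz * r_xz) / (1 - r_xz**2)

def partial_r(r_yx, r_yz, r_xz):
    """Partial correlation coefficient of y and x for constant z."""
    return (r_yx - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Made-up total correlations, for illustration only.
r_yx, r_yz, r_xz = 0.6, 0.5, 0.4

big_r2 = multiple_r2(r_yx, r_yz, r_xz)
# Linear case of (H^2 - eta^2) / (1 - eta^2): H -> R and eta_{y.z} -> r_yz.
from_ratio_formula = (big_r2 - r_yz**2) / (1 - r_yz**2)
# Square of the ordinary partial correlation coefficient.
from_partial_r = partial_r(r_yx, r_yz, r_xz) ** 2

print(from_ratio_formula, from_partial_r)  # the two agree up to rounding
```

Any admissible triple of correlations may be substituted; the agreement is algebraic and does not depend on the values chosen.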

$H_{y.xz}$ is a "multiple correlation ratio," just as $R_{y.xz}$ is a "multiple
correlation coefficient." Just as the partial correlation coefficient can always be
deduced from the formula

$${}_z r_{yx} = \sqrt{\frac{R^2_{y.xz} - r_{yz}^2}{1 - r_{yz}^2}},$$

so the "partial correlation ratio" can always be found from

$${}_z\eta_{y.x} = \sqrt{\frac{H^2_{y.xz} - \eta^2_{y.z}}{1 - \eta^2_{y.z}}}.$$

In this lies the importance of the determination of the expressions for the
value of $H_{y.xz}$.

If x and z be given by classes or categories and y be a quantitative variate,
then $H_{y.xz}$ can clearly be found from

$$H^2_{y.xz} = \frac{S_z S_x\{n_{xz}(\bar{y} - \bar{y}_{xz})^2\}}{N\sigma_y^2},$$

and ${}_z\eta_{y.x}$ will also be ascertainable.
For example, we might ask for the relationship between convictions for
drunkenness per year of adult life and mental capacity for a constant grade
of health. The above formula for the partial correlation ratio in terms of the
multiple correlation ratio was deduced by me some years ago as suitable for
dealing with certain variates in criminology. It is laborious in use, owing to the work
needful to evaluate numerically $H_{y.xz}$ when the grouping is at all fine. Quite
recently Mr. Isserlis, in a very valuable memoir,* has investigated some of the
conditions under which the determination of $H_{y.xz}$ may be thrown back
on quantities like $\eta_{x.y}$, $\eta_{x.z}$, $\eta_{y.z}$, $\eta_{z.y}$, $r_{xy}$, $r_{xz}$, $r_{yz}$, etc., which depend solely
on paired variates, i.e., on the "marginal total tables" of the "correlation (or
contingency) solid." This is an important conception and corresponds to the
expression of the multiple correlation coefficients $R_{y.xz}$ in terms of the total
coefficients $r_{xy}$, $r_{yz}$, $r_{zx}$.
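
The computation of $H_{y.xz}$ for categorical x and z can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's procedure: the twelve records are made up, and the function names are hypothetical. The sketch evaluates the partial correlation ratio twice, once from the multiple correlation ratio and once from the weighted-mean definition, to show that the two routes agree.

```python
from collections import defaultdict

def group(y, labels):
    """Collect the values of y into arrays keyed by label."""
    arrays = defaultdict(list)
    for value, label in zip(y, labels):
        arrays[label].append(value)
    return arrays

def correlation_ratio_sq(y, labels):
    """eta^2 = S{n_a (ybar - ybar_a)^2} / (N sigma_y^2) over the arrays a."""
    mean = sum(y) / len(y)
    total_ss = sum((v - mean) ** 2 for v in y)  # N * sigma_y^2
    between_ss = sum(len(a) * (sum(a) / len(a) - mean) ** 2
                     for a in group(y, labels).values())
    return between_ss / total_ss

def within_ss(y, labels):
    """Sum of squared deviations of y from the mean of its own array."""
    return sum(sum((v - sum(a) / len(a)) ** 2 for v in a)
               for a in group(y, labels).values())

def partial_ratio_sq(y, x, z):
    """(H^2_{y.xz} - eta^2_{y.z}) / (1 - eta^2_{y.z})."""
    eta2_yz = correlation_ratio_sq(y, z)
    h2 = correlation_ratio_sq(y, list(zip(x, z)))  # arrays of x with z
    return (h2 - eta2_yz) / (1 - eta2_yz)

def partial_ratio_sq_by_definition(y, x, z):
    """Weighted-mean definition: pooled within-(x, z) SS over pooled within-z SS."""
    return 1 - within_ss(y, list(zip(x, z))) / within_ss(y, z)

# Made-up records: y quantitative, x and z given by categories.
y = [1, 2, 4, 3, 6, 8, 5, 9, 7, 12, 10, 11]
x = ["a", "a", "b", "b", "a", "a", "b", "b", "a", "b", "a", "b"]
z = ["p", "p", "p", "p", "q", "q", "q", "q", "r", "r", "r", "r"]

via_formula = partial_ratio_sq(y, x, z)
via_definition = partial_ratio_sq_by_definition(y, x, z)
print(via_formula, via_definition)
```

The labour referred to in the text shows itself here as the number of (x, z) cells: every cell needs its own mean, so a fine grouping multiplies the arithmetic.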

(3) It is clear that the six partial correlation ratios ${}_x\eta_{y.z}$, ${}_x\eta_{z.y}$, ${}_y\eta_{z.x}$,
${}_y\eta_{x.z}$, ${}_z\eta_{y.x}$, and ${}_z\eta_{x.y}$ are not independent, but connected with each other, pair
and pair, and with the three multiple correlation ratios, $H_{y.xz}$, $H_{z.yx}$, $H_{x.zy}$,
by the equations,

$$\left.\begin{aligned}
\frac{1 - {}_x\eta^2_{y.z}}{1 - \eta^2_{y.z}} &= \frac{1 - {}_z\eta^2_{y.x}}{1 - \eta^2_{y.x}} = \frac{1 - H^2_{y.xz}}{(1 - \eta^2_{y.z})(1 - \eta^2_{y.x})}, \\
\frac{1 - {}_y\eta^2_{z.x}}{1 - \eta^2_{z.x}} &= \frac{1 - {}_x\eta^2_{z.y}}{1 - \eta^2_{z.y}} = \frac{1 - H^2_{z.yx}}{(1 - \eta^2_{z.x})(1 - \eta^2_{z.y})}, \\
\frac{1 - {}_z\eta^2_{x.y}}{1 - \eta^2_{x.y}} &= \frac{1 - {}_y\eta^2_{x.z}}{1 - \eta^2_{x.z}} = \frac{1 - H^2_{x.zy}}{(1 - \eta^2_{x.y})(1 - \eta^2_{x.z})}.
\end{aligned}\right\} \qquad (\alpha)$$

Thus the labour of calculating the six partial correlation ratios reduces to
that of finding the six total $\eta$'s and the three $H$'s or multiple correlation
ratios.
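
As a small numerical illustration of the first of the relations (α), the following Python sketch takes made-up values of the two total ratios and of the multiple ratio for a single dependent variate y, derives both partial ratios from them, and evaluates the three members of the equation. The values and the helper name are assumptions introduced only for this example.

```python
def partial_sq(h2, eta2_held):
    """Squared partial ratio from H^2 and the total eta^2 on the variate held constant."""
    return (h2 - eta2_held) / (1 - eta2_held)

# Made-up values; H^2 cannot be less than either total eta^2.
eta2_yz, eta2_yx, h2_y_xz = 0.30, 0.20, 0.44

z_eta2_yx = partial_sq(h2_y_xz, eta2_yz)  # y on x for constant z
x_eta2_yz = partial_sq(h2_y_xz, eta2_yx)  # y on z for constant x

# The three members of the first line of (alpha).
first = (1 - x_eta2_yz) / (1 - eta2_yz)
second = (1 - z_eta2_yx) / (1 - eta2_yx)
third = (1 - h2_y_xz) / ((1 - eta2_yz) * (1 - eta2_yx))
print(first, second, third)  # equal up to rounding
```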

* 'Biometrika,' vol. 10, pp. 391-411.

(4) In the usual notation ${}_{34\ldots m}r_{12}$ represents the partial correlation
coefficient of the first and second variates for constant 3, 4, ..., m variates,
and $R_{1.234\ldots m}$ represents the multiple correlation coefficient of the first variate
with regard to the combined 2, 3, 4, ..., m variates. If $\Delta$ be the determinant

$$\Delta = \begin{vmatrix}
1 & r_{12} & r_{13} & \cdots & r_{1m} \\
r_{21} & 1 & r_{23} & \cdots & r_{2m} \\
r_{31} & r_{32} & 1 & \cdots & r_{3m} \\
\vdots & \vdots & \vdots & & \vdots \\
r_{m1} & r_{m2} & r_{m3} & \cdots & 1
\end{vmatrix},$$

where $r_{ss'} = r_{s's}$ is the correlation coefficient of the s and s' variates, and $\Delta_{pq}$
be the minor of the constituent $r_{pq}$, $\Delta_{pq,p'q'}$ the second minor of the
constituents $r_{pq}$ and $r_{p'q'}$, then it is well known that*

$${}_{34\ldots m}r_{12} = \frac{-\Delta_{12}}{\sqrt{\Delta_{11}\Delta_{22}}},$$

and†

$$1 - R^2_{1.234\ldots m} = \frac{\Delta}{\Delta_{11}}.$$

Thus we have

$$1 - {}_{34\ldots m}r^2_{12} = \frac{\Delta_{11}\Delta_{22} - \Delta_{12}^2}{\Delta_{11}\Delta_{22}}.$$

But by a familiar theorem in determinants

$$\Delta_{11,22} = (\Delta_{11}\Delta_{22} - \Delta_{12}^2)/\Delta,$$

hence

$$1 - {}_{34\ldots m}r^2_{12} = \frac{\Delta\,\Delta_{11,22}}{\Delta_{11}\Delta_{22}} = \frac{\Delta/\Delta_{11}}{\Delta_{22}/\Delta_{22,11}} = \frac{1 - R^2_{1.23\ldots m}}{1 - R^2_{1.3\ldots m}} = \frac{1 - R^2_{2.13\ldots m}}{1 - R^2_{2.3\ldots m}},$$

since the left-hand side is symmetrical in 1 and 2. By successive reduction
we can clearly write‡

$$1 - {}_{34\ldots m}r^2_{12} = \frac{1 - R^2_{1.23\ldots m}}{(1 - {}_{45\ldots m}r^2_{13})(1 - {}_{56\ldots m}r^2_{14})(1 - {}_{67\ldots m}r^2_{15}) \cdots (1 - r^2_{1m})},$$

which allows us to express the partial correlation coefficient of the $(m-2)$th
order in terms of the multiple coefficient of correlation of the $(m-1)$th
order, and partial correlation coefficients of the $(m-3)$th, $(m-4)$th,
$(m-5)$th, ..., and zero orders.
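
The determinant relations for the linear case can be verified on a small example. The sketch below is written in plain Python under stated assumptions: the 4 x 4 correlation matrix is made up (though positive definite), the helper names are hypothetical, and the "minor" is taken with its sign, i.e. as a cofactor. It compares the partial coefficient obtained from the minors with the ratio of the two multiple-correlation terms and with the successively reduced form.

```python
import math

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda k: abs(a[k][i]))
        if a[p][i] == 0.0:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for k in range(i + 1, n):
            f = a[k][i] / a[i][i]
            for j in range(i, n):
                a[k][j] -= f * a[i][j]
    return d

def cofactor(m, p, q):
    """Signed minor of the constituent in row p, column q (0-based)."""
    sub = [[v for j, v in enumerate(row) if j != q]
           for i, row in enumerate(m) if i != p]
    return (-1) ** (p + q) * det(sub)

def one_minus_r2_multiple(m):
    """1 - R^2 of the first variate on all the others: Delta / Delta_11."""
    return det(m) / cofactor(m, 0, 0)

def sub_corr(m, keep):
    """Correlation matrix restricted to the variates listed in keep."""
    return [[m[i][j] for j in keep] for i in keep]

# Made-up correlation matrix of four variates (index 0 is the "first" variate).
corr = [[1.0, 0.5, 0.3, 0.2],
        [0.5, 1.0, 0.4, 0.1],
        [0.3, 0.4, 1.0, 0.6],
        [0.2, 0.1, 0.6, 1.0]]

# Partial coefficient of variates 1 and 2 for constant 3 and 4, from the minors.
r12_34 = -cofactor(corr, 0, 1) / math.sqrt(cofactor(corr, 0, 0) * cofactor(corr, 1, 1))
lhs = 1 - r12_34 ** 2

# (1 - R^2_{1.234}) / (1 - R^2_{1.34})
ratio_form = one_minus_r2_multiple(corr) / one_minus_r2_multiple(sub_corr(corr, [0, 2, 3]))

# Successive reduction: (1 - R^2_{1.234}) / ((1 - r^2_{13.4}) (1 - r^2_{14}))
r13, r14, r34 = corr[0][2], corr[0][3], corr[2][3]
r13_4 = (r13 - r14 * r34) / math.sqrt((1 - r14 ** 2) * (1 - r34 ** 2))
reduced = one_minus_r2_multiple(corr) / ((1 - r13_4 ** 2) * (1 - r14 ** 2))

print(lhs, ratio_form, reduced)  # all three agree up to rounding
```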

Now practically identical formulae hold connecting the partial correlation
ratio of the $(m-2)$th order with multiple correlation ratios, except that
1 and 2 are no longer interchangeable.§ We have, in fact

$$1 - {}_{34\ldots m}\eta^2_{1.2} = \frac{1 - H^2_{1.23\ldots m}}{1 - H^2_{1.34\ldots m}}, \qquad (\beta)$$

or by successive reduction we deduce

$$1 - {}_{34\ldots m}\eta^2_{1.2} = \frac{1 - H^2_{1.23\ldots m}}{(1 - {}_{45\ldots m}\eta^2_{1.3})(1 - {}_{56\ldots m}\eta^2_{1.4})(1 - {}_{67\ldots m}\eta^2_{1.5}) \cdots (1 - \eta^2_{1.m})}. \qquad (\gamma)$$

* 'Phil. Trans.,' A, vol. 200, p. 10 (1902), Eqn. xxvii.

† 'Biometrika,' vol. 8, p. 439, Eqn. vi.

‡ This result was first given by Yule, 'Roy. Soc. Proc.,' A, vol. 79, p. 189 (1907),
Eqn. (17).

§ In skew regression $\eta_{u.v}$ is of course not equal to $\eta_{v.u}$, although numerically both lie
between $r_{uv} = r_{vu}$ and unity.
Since on the left and in the numerator of the right any of the variates
3, 4, ..., m are interchangeable without changing the values of the left-hand
side or the numerator, it follows that an immense variety of forms can be
given to the partial correlation ratios in the denominator. Further, no one
of these partial correlation ratios can be perfect, i.e. equal unity, without
the multiple correlation ratio $H_{1.23\ldots m}$ being also perfect, and this remark
applies to every multiple correlation ratio of a higher order, i.e. if
${}_{s(s+1)\ldots m}\eta_{1.(s-1)} = 1$ then will $H_{1.p(p+1)\ldots s\ldots m} = 1$, p being < s.

It will be clear that our results (α) for the partial correlation ratios of the
first order are only very special cases of (β), or rather (γ), above.

To demonstrate (β) is little more than a matter of the definitions of our
high order partial correlation ratio and our multiple correlation ratio.

By definition

$$H^2_{1.23\ldots m} = \frac{S_{x_2} S_{x_3} \cdots S_{x_m}\{n_{23\ldots m}(\bar{x}_1 - \bar{x}_{1.234\ldots m})^2\}}{N\sigma_1^2},$$

$\bar{x}_1$ being the general mean of the first variate and $\bar{x}_{1.234\ldots m}$ the partial mean
for constant 2nd, 3rd, 4th, ..., mth variates, whence we can deduce*

$$1 - H^2_{1.23\ldots m} = \frac{S_{x_1} S_{x_2} \cdots S_{x_m}\{n_{123\ldots m}(x_1 - \bar{x}_{1.23\ldots m})^2\}}{N\sigma_1^2},$$

or $\sigma_1^2(1 - H^2_{1.23\ldots m})$ is the mean square standard deviation of arrays of $x_1$ for
constant values of the $m-1$ other variates $x_2, x_3, \ldots, x_m$. But we have
originally defined $\eta_{1.2}$ by saying that $\sigma_1^2(1 - \eta^2_{1.2})$ shall be the mean square
standard deviation of the arrays of the first variate which correspond to
constant values of the second variate. Now let these arrays of the first
variate be still further limited by being taken for constant values of the 3rd
to the mth variates. Then consider the expression:

$${}_{34\ldots m}\sigma_1^2\,(1 - {}_{34\ldots m}\eta^2_{1.2}),$$

where ${}_{34\ldots m}\sigma_1^2$ is the mean squared standard deviation of the first variate for
constant 3rd to the mth variates. This by analogous definition is the mean
squared standard deviation of the 1st on the 2nd variate for constant 3rd to
the mth variates, i.e., for 1st on 2nd to mth variate,

$$= S_{x_1} S_{x_2} \cdots S_{x_m}\{n_{123\ldots m}(x_1 - \bar{x}_{1.23\ldots m})^2\}/N = \sigma_1^2(1 - H^2_{1.23\ldots m}).$$

Thus

$$1 - {}_{34\ldots m}\eta^2_{1.2} = \frac{\sigma_1^2(1 - H^2_{1.23\ldots m})}{{}_{34\ldots m}\sigma_1^2}.$$

But clearly ${}_{34\ldots m}\sigma_1^2 = \sigma_1^2(1 - H^2_{1.34\ldots m})$, and there results

$$1 - {}_{34\ldots m}\eta^2_{1.2} = \frac{1 - H^2_{1.234\ldots m}}{1 - H^2_{1.34\ldots m}}. \qquad (\beta)$$
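
The result (β), and the successive reduction (γ) that follows from it, can be sketched for four categorical variates in Python. This is a hedged illustration only: the data are generated by an arbitrary made-up rule, the function names are hypothetical, and each multiple correlation ratio is computed by brute force from the arrays, which is exactly the laborious route the text describes.

```python
from collections import defaultdict

def within_ss(x1, keys):
    """Sum of squared deviations of x1 from the mean of its own array."""
    arrays = defaultdict(list)
    for value, key in zip(x1, keys):
        arrays[key].append(value)
    total = 0.0
    for values in arrays.values():
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return total

def one_minus_h2(x1, others):
    """1 - H^2 of x1 on the variates in `others` (list of columns; empty gives 1)."""
    n = len(x1)
    keys = list(zip(*others)) if others else [()] * n
    return within_ss(x1, keys) / within_ss(x1, [()] * n)

def one_minus_partial_eta2(x1, x2, held):
    """Result (beta): (1 - H^2 of x1 on x2 with held) / (1 - H^2 of x1 on held)."""
    return one_minus_h2(x1, [x2] + held) / one_minus_h2(x1, held)

# Made-up data: three two-class variates and a quantitative first variate.
n = 24
x2 = [i % 2 for i in range(n)]
x3 = [(i // 2) % 2 for i in range(n)]
x4 = [(i // 4) % 2 for i in range(n)]
x1 = [(7 * i) % 11 + 3 * x2[i] * x3[i] + 2 * x4[i] for i in range(n)]

# Left side of (gamma) for m = 4: 1 minus the squared partial ratio of 1 on 2, constant 3 and 4.
lhs = one_minus_partial_eta2(x1, x2, [x3, x4])

# Right side of (gamma): (1 - H^2_{1.234}) / ((1 - eta^2 of 1 on 3 for constant 4)(1 - eta^2_{1.4})).
rhs = one_minus_h2(x1, [x2, x3, x4]) / (
    one_minus_partial_eta2(x1, x3, [x4]) * one_minus_partial_eta2(x1, x4, []))

print(lhs, rhs)  # equal up to rounding
```

Because the denominator of (γ) telescopes to $1 - H^2_{1.34\ldots m}$, the order in which the variates 3, 4, ..., m are removed does not affect the result.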

* Isserlis, 'Biometrika,' vol. 10, p. 393.
Indeed, the two theorems, that just given and the previous one,

$$1 - {}_{34\ldots m}r^2_{12} = \frac{1 - R^2_{1.234\ldots m}}{1 - R^2_{1.34\ldots m}},$$

are, verbally expressed, identities, the latter having relation to standard
deviations measured from planes in higher dimensioned space, i.e. to multiple
"linear" regression, and the former to standard deviations measured from
curved surfaces in higher dimensioned space, i.e. to multiple "skew" regression.
The one theorem passes into the other as the skew regression surfaces become
planes.

Unfortunately while the rule for finding $H_{1.23\ldots m}$ is quite simple, the
arithmetic is very laborious. The next step in advance must be such a study 
of skew regression surfaces that we shall learn how to express the multiple 
correlation ratio in terms of total correlation ratios as we know how to 
express the multiple correlation coefficient in terms of total correlation 
coefficients. The first step in this direction has recently been taken by 
Isserlis in the memoir cited above. 



On a Spectrum Associated with Carbon, in Relation to the

Wolf-Rayet Stars.

By Thomas R. Merton, B.Sc. (Oxon.), Lecturer in Spectroscopy at University

of London, King's College.

(Communicated by A. Fowler, F.R.S. Received June 3, 1915.)

[Plate 7.] 

The comprehensive investigations of Campbell* have shown that the spectra 
of the Wolf-Rayet stars contain in addition to lines due to hydrogen and
helium, a number of lines which have not been identified with any spectrum
which has hitherto been produced in the laboratory. Owing to the very
diffuse character of the lines in the spectra of the Wolf-Rayet stars, accurate
measurements of wave-length are impossible, and any identification of the 
lines with a terrestrial spectrum must, therefore, depend on the apparent 
coincidence of a relatively large number of lines with the spectrum produced 
in the laboratory. 

* ' Astronomy and Astrophysics,' vol. 13, p. 448 (1894). 



